Use copy of state for state dumps to file #365
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff            @@
##           master     #365   +/-   ##
=========================================
  Coverage   98.66%   98.66%
=========================================
  Files          77       77
  Lines        9706     9707     +1
=========================================
+ Hits         9576     9577     +1
  Misses        130      130
☔ View full report in Codecov by Sentry.
Yes, this looks like the right approach, though I think you probably want a deep copy.
Also, I'm not sure how expensive it is for Pydantic to copy the entire structure, which can grow quite large. If running against the entire set of Uninett routers, is there a noticeable difference in the runtime of this function before and after this PR?
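One low-effort way to answer the runtime question would be to time the dump against a fully populated state before and after the change, for example (a minimal sketch; the state instance and filename here are placeholders):

import time

from zino.state import ZinoState

state = ZinoState()  # in practice: the full state built from a run against all routers
start = time.perf_counter()
state.dump_state_to_file("zino-state.json")
print(f"dump_state_to_file took {time.perf_counter() - start:.3f} seconds")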
@@ -42,8 +42,9 @@ class ZinoState(BaseModel):
     def dump_state_to_file(self, filename: str):
         """Dumps the full state to a file in JSON format"""
         _log.debug("dumping state to %s", filename)
+        copied_state = self.model_copy()
I think you may actually want a deep copy of the state to ensure nothing is being updated from another thread:
https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_copy
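Concretely, the suggestion amounts to something like this in ZinoState.dump_state_to_file (a minimal sketch; the file-writing part is only an assumption about what the rest of the method does):

def dump_state_to_file(self, filename: str):
    """Dumps the full state to a file in JSON format"""
    _log.debug("dumping state to %s", filename)
    # deep=True recursively copies nested models, so later mutations of the
    # live state cannot change the structure while it is being serialized
    copied_state = self.model_copy(deep=True)
    with open(filename, "w") as statefile:
        statefile.write(copied_state.model_dump_json(indent=2))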
If I try to use a deep copy, I get the following exception:
2024-09-03 09:48:07,045 - ERROR - apscheduler.executors.default (asyncio_0) - Job "ZinoState.dump_state_to_file (trigger: interval[0:05:00], next run at: 2024-09-03 09:53:07 CEST)" raised an exception
Traceback (most recent call last):
File "/home/johanna/zino/.venv/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/home/johanna/zino/src/zino/utils.py", line 33, in wrapper
result = func(*args, **kwargs)
File "/home/johanna/zino/src/zino/state.py", line 45, in dump_state_to_file
copied_state = self.model_copy(deep=True)
File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 311, in model_copy
copied = self.__deepcopy__() if deep else self.__copy__()
File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 777, in __deepcopy__
_object_setattr(m, '__dict__', deepcopy(self.__dict__, memo=memo))
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
y = copier(memo)
File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 789, in __deepcopy__
deepcopy({k: v for k, v in self.__pydantic_private__.items() if v is not PydanticUndefined}, memo=memo),
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 206, in _deepcopy_list
append(deepcopy(a, memo))
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 238, in _deepcopy_method
return type(x)(x.__func__, deepcopy(x.__self__, memo))
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 206, in _deepcopy_list
append(deepcopy(a, memo))
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.10/copy.py", line 211, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.10/copy.py", line 211, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.10/copy.py", line 161, in deepcopy
rv = reductor(4)
TypeError: cannot pickle '_asyncio.Future' object
Oooh, that seems pretty weird... what could be putting a Future object into this structure?
I'm investigating it - it is not FlappingStates - I checked out a commit before it was added and the same error happens.
Same with PlannedMaintenances
Alright, I have figured out that the problem lies with ZinoState.events - I will investigate further which of them lead to problems.
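One way to narrow this down is to try deep-copying each event on its own and report the ones that fail, roughly like this (a hypothetical sketch; the attribute exposing the event mapping is an assumption about the ZinoState layout):

from copy import deepcopy

# Assumption: state.events exposes the event index as a dict of event objects
for event_id, event in state.events.events.items():
    try:
        deepcopy(event)
    except TypeError as error:
        print(f"event {event_id} cannot be deep-copied: {error}")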
But then again, I'm not sure that this actually solves the issue at hand. I still get the occasional error about dicts changing size while Pydantic is iterating them to copy them, which was the reason for doing a copy to begin with. And, of course, this makes sense. The copy is being made in the same thread that is doing the dumping, so if the structure is being changed by another thread, the same problem will still occur.
The only way to ensure we don't get this error is to ensure that no other threads are concurrently modifying the data structure while it's being serialized to disk (or while it's being copied). This can be done by either making sure we're not running multiple threads, or by adding locks (which will complicate things quite a bit).
It could also be "worked around" by simply retrying the serialization several times if it fails due to these errors - which would be safe enough once #366 is merged, as that would protect us from leaving a corrupt state file when the error inevitably occurs.
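Such a retry workaround could look roughly like this (purely hypothetical; the function name, retry count and the exception type caught are assumptions):

import logging

_log = logging.getLogger(__name__)

def dump_state_with_retries(state, filename, attempts=3):
    """Retry the state dump a few times if a concurrent modification breaks it."""
    for attempt in range(1, attempts + 1):
        try:
            state.dump_state_to_file(filename)
            return
        except RuntimeError as error:
            # e.g. "dictionary changed size during iteration" caused by another thread
            _log.warning("state dump attempt %d failed: %s", attempt, error)
    _log.error("giving up on dumping state after %d attempts", attempts)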
I do believe the reason the dumping is happening in a separate thread is that the method dump_state_to_file() is not declared as async. When this function is scheduled to run using APScheduler, APScheduler will automatically defer its execution to a separate worker thread, because a non-async function would otherwise block the entire event loop.
I think the safest way to run the function would be to declare it as an async function, which lets it run in the main event loop thread. It will, of course, block the event loop while serializing - which could be OK if it's fast, disastrous if it is slow. A more complicated implementation would be to copy the state in the main thread and then defer the actual serialization of this copy to a worker thread (since the disk I/O is what would block the most), as sketched below.
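A rough sketch of that more complicated variant could look like this (hypothetical code, assuming the deep-copy problem discussed above is solved first; _write_state is a made-up helper name):

import asyncio

async def dump_state_to_file(self, filename: str):
    """Copy the state on the event loop thread, serialize it in a worker thread."""
    # The copy is made in the event loop thread, so no coroutine can mutate the
    # state while it is being copied. It must be a deep copy for the worker
    # thread below to be fully isolated from later mutations.
    copied_state = self.model_copy(deep=True)
    # The slow part (JSON serialization and the disk write) only touches the copy
    await asyncio.to_thread(_write_state, copied_state, filename)

def _write_state(state, filename: str) -> None:
    with open(filename, "w") as statefile:
        statefile.write(state.model_dump_json(indent=2))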
Scope and purpose
Mentioned in #364.
Contributor Checklist
Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to Zino can be found in the README.